Complete Chloroplast Genomes of Saussurea katochaete, Saussurea superba, and Saussurea stella: Genome Structures and Comparative and Phylogenetic Analyses

Saussurea plants are widely distributed in Asia and Europe; however, their complex phylogenetic relationships have led to many difficulties in phylogenetic studies and interspecific identification. In this study, we assembled, annotated, and analyzed the chloroplast genomes of three Saussurea plants: Saussurea katochaete, Saussurea superba, and Saussurea stella. The results showed that the full-length sequences of the three Saussurea plants were 152,586 bp, 152,490 bp, and 152,442 bp, respectively; all three showed the typical quadripartite structure, and the genomes were relatively conserved. The gene annotation results showed that the chloroplast genomes of S. katochaete, S. superba, and S. stella were annotated with 128, 124, and 127 unique genes, respectively, which included 83, 80, and 83 protein-coding genes (PCGs); 37, 36, and 36 tRNA genes; and 8 rRNA genes in each species. Moreover, 46, 45, and 43 SSR loci, respectively, and nine highly variable regions (rpl32-trnL-UAG, rpl32, ndhF-rpl32, ycf1, trnC-GCA-petN, trnC-GCA, rpcL, psbE-petL, and rpl16-trnG-UUG) were identified and could be used as potential molecular markers for population identification and phylogenetic study of Saussurea plants. Phylogenetic analyses strongly support the sisterhood of S. katochaete with S. superba and S. stella, all of which cluster with S. depsangensis, S. inversa, S. medusa, and S. gossipiphora, of which S. gossipiphora is the most closely related. Additionally, the phylogenetic results indicate a high frequency of differentiation among Saussurea species, and many species or genera are morphologically very different from each other, which may be related to certain genetic material in the chloroplasts. This study provides an important reference for the identification of Saussurea plants and for studies of their evolution and phylogenetics.


Introduction
Saussurea originated in the early-middle Miocene within the Hengduan Mountains [1] and comprises annual, biennial, or perennial herbs of the tribe Cynareae of the Compositae. There are more than 400 species worldwide, mainly in Asia and Europe, with about 264 species (66%) found in China, mainly in areas at altitudes of 400-5000 m [2]. Due to the wide distribution of the genus, the variety of species, and the unclear morphological differentiation between species, the identification of species is difficult [3,4].
More than 10 species of this genus have been used to treat bruises and sprains, altitude sickness, and food poisoning. In the last two decades, it has been found that the chemical constituents of Saussurea plants mainly include compounds such as steroids, phenylpropanoids, flavonoids, sesquiterpenoids, and triterpenoids, which are known to possess antitumor, antibacterial, anti-inflammatory, and cardiotonic activities [5,6]. Because of their high medicinal value and because wild resources are not yet available for large-scale application, artificial cultivation is one of the future development trends. However, Saussurea plants are distributed in harsh environments, and their population identification is controversial [7], which makes accurate species identification one of the prerequisites for achieving cultivation.
Previous studies have shown that the plant chloroplast genome, which together with the nuclear genome and the mitochondrial genome makes up the genetic material of a plant cell, is an effective tool for species identification [8,9]. Chloroplasts, like mitochondria, are double-membrane-bound cytoplasmic organelles whose origin can be traced back at least one billion years to a photosynthetic cyanobacterium [9-11]. Over a long period of evolution, chloroplasts have become indispensable organelles that allow green plants to carry out life activities such as photosynthesis and carbon fixation [12]. They are involved in the synthesis of fatty acids and amino acids in the plant body [13], and they possess a structurally complete genome [14], which usually does not exceed 160 kb and includes a large single-copy region, a small single-copy region, and two inverted repeat regions [15]. Compared with the nuclear and mitochondrial genomes, chloroplast genomes have the advantages of easy discernment of their age characteristics, easy access to sequences, a more conserved gene composition and structure, rich genetic information, and high species discrimination [9,16]. These characteristics are of great significance for the identification of Saussurea plant species, which have complex germplasm relationships and a rich diversity.
S. katochaete and S. stella are both perennial stemless rosette-like herbs. In China, they are mainly distributed in hillside meadows, valley marshes, river meadows, alpine meadows, and hillside scrub meadows in Gansu, Qinghai, Sichuan, Yunnan, and other provinces at 2230-4700 m and 2000-5400 m, respectively. S. superba is a perennial herb that is mainly found in the sandy river valleys of the Gansu and Qinghai Provinces in China. Previous studies on Saussurea plants have focused on resource investigations [17], karyotyping [7,18], and chemical composition analyses [5], but chloroplast genomic studies on S. katochaete, S. superba, and S. stella have not been reported. Therefore, in this study, we assembled, annotated, and analyzed the chloroplast genomes of S. katochaete, S. superba, and S. stella using second- and third-generation high-throughput sequencing technologies and revealed their phylogenetic relationships. The results are expected to provide a reference for population identification, germplasm resource identification, and phylogenetic studies of Saussurea plants.

Characterization of the Saussurea Chloroplast Genome
The full-length sequences of the chloroplast genomes of the three Saussurea plants were 152,586 bp (S. katochaete), 152,490 bp (S. superba), and 152,442 bp (S. stella). They were double-stranded circular molecules with a typical quadripartite structure, as in most herbaceous plants. Two inverted repeat regions (IRs) separate the large single-copy (LSC) and small single-copy (SSC) regions (Figures 1-3). In S. katochaete, S. superba, and S. stella, the IRa lengths were 25,193 bp, 25,193 bp, and 25,201 bp, and the IRb lengths were 25,193 bp, 25,193 bp, and 25,201 bp, respectively. The lengths of the LSC regions were 83,551 bp, 83,460 bp, and 83,457 bp, and the lengths of the SSC regions were 18,649 bp, 18,644 bp, and 18,583 bp, respectively (Table 1). The GC contents of the three Saussurea plants were similar (37.68-37.69%), and the GC contents of the IR regions (43.12-44.28%) were higher than those of the LSC region (35.80-35.94%) and the SSC region (31.39-31.40%) (Table 1). This might be related to the GC-rich rRNA and tRNA genes located in the IR regions [14,19]. The AT content (62.31-62.32%) was greater than the GC content (37.68-37.69%), indicating the same characteristic as other plant chloroplast genomes, i.e., an obvious AT bias [20].
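The quadripartite structure implies that the total genome length should equal LSC + SSC + 2 × IR. The following minimal sketch checks this for the region lengths quoted above; the dictionary layout and function name are illustrative and not part of the original analysis.

```python
# Region lengths (bp) as reported in the text above (Table 1).
# The data layout and the function name are illustrative assumptions.
genomes = {
    "S. katochaete": {"LSC": 83551, "SSC": 18649, "IR": 25193, "total": 152586},
    "S. superba": {"LSC": 83460, "SSC": 18644, "IR": 25193, "total": 152490},
    "S. stella": {"LSC": 83457, "SSC": 18583, "IR": 25201, "total": 152442},
}


def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular genome with one LSC, one SSC, and two IR copies."""
    return lsc + ssc + 2 * ir


for name, g in genomes.items():
    computed = quadripartite_length(g["LSC"], g["SSC"], g["IR"])
    # Report whether the region lengths add up to the stated full length.
    print(f"{name}: computed {computed} bp, reported {g['total']} bp, "
          f"consistent: {computed == g['total']}")
```

For all three species the region lengths sum to the full lengths given in this section.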

Gene Annotation and Categorization Analysis
The annotation results showed that the chloroplast genome of S. katochaete contains 128 genes, of which 83 are PCGs, 8 are rRNA genes, and 37 are tRNA genes (Table 2). The chloroplast genome of S. superba contains 124 genes, of which 80 are PCGs, 8 are rRNA genes, and 36 are tRNA genes. The chloroplast genome of S. stella contains 127 genes, of which 83 are PCGs, 8 are rRNA genes, and 36 are tRNA genes. The genes can be classified into four main categories based on their functions. The first category comprises genes related to photosynthesis, including photosystem I, photosystem II, the cytochrome b/f complex, ATP synthase, and NADH dehydrogenase. The second category comprises genes associated with self-replication, including small-subunit ribosomal proteins (SSU), large-subunit ribosomal proteins (LSU), RNA polymerase, the RubisCO large subunit, transfer RNAs, and ribosomal RNAs. The third category contains genes associated with other syntheses, including protease, maturase, and translational initiation factors. The fourth category contains genes with unknown functions. In addition, a total of 22 intron-containing genes were identified in the three Saussurea plants (Tables 3-5, Figure 4). Among these intron-containing genes, 18 genes (atpF, clpP, ndhA, ndhB, petB, petD, rpl16, rpl2, rpl2, rpoC1, rps16, trnA-UGC, trnA-UGC, trnI-GAU, trnI-GAU, trnK-UUU, trnL-UAA, and trnS-CGA) contain one intron, including thirteen PCGs and five tRNA genes, and two genes (clpP and ycf3) contain two introns. The two rps12 genes are trans-spliced.

Long Repeat Sequences and SSR Analyses
Long repetitive sequences comprise three types: forward (F), palindromic (P), and tandem (T) repeats. They may promote chloroplast genome rearrangements and can increase the genetic diversity of populations. A total of 90 unique long repetitive sequences were detected in the chloroplast genomes of the three Saussurea plants (Table 6). Of these, 25 pairs were found in S. katochaete, including one forward and twenty-four palindromic repeats. S. superba and S. stella had 36 and 29 pairs of long repeat sequences, respectively, all of which were palindromic repeats (Table 6). Thus, the long repetitive sequences found in the chloroplast genomes of the three Saussurea species are almost all palindromic repeats, with no tandem repeats and a single forward repeat in S. katochaete. Of these repeats, the shortest is only 30 bp in length, while the longest is as large as 4344 bp. The results indicate that the number, distribution, and length of the long repetitive sequences in the chloroplast genomes of Saussurea plants are heterogeneous.
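To make the distinction between forward and palindromic repeats concrete, the sketch below classifies a pair of equal-length segments: a forward repeat is an identical copy on the same strand, whereas a palindromic repeat matches the reverse complement of the other copy. The function names and the toy sequences are hypothetical; this is not the repeat-detection tool used in the study.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_repeat(seq, start1, start2, length):
    """Classify two segments of `seq` (0-based starts) as a repeat pair.

    Returns "F" for a forward repeat, "P" for a palindromic repeat,
    and None if the two segments do not form either kind of repeat.
    """
    a = seq[start1:start1 + length]
    b = seq[start2:start2 + length]
    if a == b:
        return "F"
    if a == reverse_complement(b):
        return "P"
    return None


# Toy examples (hypothetical sequences, not taken from the genomes).
print(classify_repeat("ACGGTACCCCACGGTA", 0, 10, 6))  # identical copies
print(classify_repeat("ACGGTACCCCTACCGT", 0, 10, 6))  # reverse-complement copies
```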

Simple sequence repeats (SSRs) are segments of genomic DNA consisting of basic units of one to six nucleotides that are repeated many times. They are widely distributed across the genome and are widely used as molecular markers in species identification and phylogenetic studies [19]. The annotation results showed that a total of 908 SSRs were identified, with the numbers of SSRs for S. katochaete, S. superba, and S. stella being 305, 303, and 300, respectively (Table 7). The average number of SSRs was about 303, with the highest number found in S. katochaete and the lowest in S. stella. Among them, 46, 45, and 43 SSR loci were identified in the three Saussurea plants, respectively (Tables 8-10). Most of the SSR loci were located in the LSC region (37, 37, and 34 loci) of the Saussurea chloroplast genome, whereas relatively few SSR loci were located in the SSC region (4, 4, and 4 loci) and the IR region (5, 5, and 5 loci). The six types of SSRs identified were mononucleotide (36.12%), dinucleotide (50%), trinucleotide (5.07%), tetranucleotide (6.72%), pentanucleotide (1.32%), and hexanucleotide (0.77%). Dinucleotide and mononucleotide repeats were the predominant types of SSRs. The five major repetitive sequence types were A/T, AA/TT, AAA/TTT, AAAA/TTTT, and AAAAA/TTTTT.
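As an illustration of how SSRs with motifs of one to six nucleotides can be located, the following sketch scans a sequence with regular expressions. The minimum repeat counts in `MIN_REPEATS` are assumed values chosen for the example; the thresholds actually used in this study are not stated in this section, and the function names are hypothetical.

```python
import re

# Assumed minimum number of repeat units per motif length (1-6 nt).
# These thresholds are illustrative, not the settings used in the study.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (0-based start, motif, repeat count) tuples sorted by position."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-nt motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. "AA" repeated, which is already reported as "A".
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence with one mononucleotide and one dinucleotide SSR.
toy = "GGC" + "A" * 12 + "CGC" + "AT" * 6 + "GCG"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

Filtering out non-primitive motifs prevents a single poly-A run from also being counted as a di-, tri-, or tetranucleotide SSR.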

Codon Usage Bias
The PCGs in the chloroplast genomes of S. katochaete, S. superba, and S. stella contained 21,408, 18,341, and 18,986 codons, respectively (Table 11). The most frequently encoded amino acid was leucine (Leu), which accounted for 10.49%, 10.33%, and 10.23% of all encoded amino acids in the three species, respectively. The least frequently encoded amino acid was cysteine (Cys), which accounted for only 1.07%, 1.03%, and 1.07%, respectively. This result is similar to that of a previous study on herbaceous plants [21]. Based on the relative synonymous codon usage (RSCU) values, the number of preferred codons in the chloroplast genomes of all three Saussurea plants was found to be 30 (RSCU > 1), and the number of non-preferred codons was found to be 32 (RSCU < 1). Of the 30 preferred codons, 29 ended in A/U; the only exception was UUG, which ends in G. This indicates that codons ending in A/U are preferred and codons ending in G/C are non-preferred [22], a result that may be related to the adaptive evolution of plant chloroplast genomes [23]. The non-preferred codons (RSCU < 1), in contrast, mostly end in G/C, suggesting that these codons occur less frequently in the Saussurea chloroplast genome. In addition, methionine (Met) and tryptophan (Trp) were each encoded by only one codon, and both RSCUs were 1. The remaining amino acids were encoded by two to six synonymous codons, in agreement with the findings of Chong et al. and Shi et al. [19,24]. The three termination codons detected were UAA, UAG, and UGA, with UAA being the most frequent, i.e., the termination codons were biased towards UAA.
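RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so RSCU > 1 marks a preferred codon and an amino acid with a single codon always has RSCU = 1. The sketch below computes RSCU from coding sequences; it uses the standard codon assignments, works on DNA (T in place of U), and its function name and toy input are illustrative assumptions rather than the software used in the study.

```python
from collections import Counter
from itertools import product

# Standard codon assignments, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Compute RSCU for every codon of each amino acid seen in the input.

    RSCU = codon count * number of synonymous codons / total count of
    that synonymous family. Stop codons are treated as their own family.
    """
    counts = Counter()
    for cds in cds_sequences:
        cds = cds.upper().replace("U", "T")
        # Read complete in-frame codons only; ignore a trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy CDS (hypothetical): ATG TTA TTA TTA CTT TGG TAA
values = rscu(["ATGTTATTATTACTTTGGTAA"])
print(values["TTA"], values["CTT"], values["ATG"])
```

In the toy input, three of the four leucine codons are TTA, so TTA is strongly preferred, while ATG (Met) and TGG (Trp) each have an RSCU of exactly 1.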

IR Expansion and Contraction
In this study, we compared the IR boundaries of the chloroplast genomes of S. katochaete, S. superba, and S. stella with those of seven closely related species: Saussurea phaeantha, Saussurea sutchuenensis, Saussurea apus, Saussurea depsangensis, Saussurea gossipiphora, Saussurea bullockii, and Saussurea leucophylla (Figure 5). The results showed that the lengths of the chloroplast genomes of the 10 Saussurea plants range from 152,270 bp (S. gossipiphora) to 152,586 bp (S. katochaete). The length of the IR region ranged from 25,185 bp (S. bullockii) to 25,202 bp (S. gossipiphora). The length of the LSC region ranged from 83,344 bp (S. gossipiphora) to 83,551 bp (S. katochaete), and the length of the SSC region ranged from 18,522 bp (S. gossipiphora) to 18,690 bp (S. leucophylla). The chloroplast genomes of the 10 species therefore differed in length by up to 316 bp; the LSC region differed by 207 bp, the SSC region by 168 bp, and the IR region by 17 bp. The boundaries between the IR and the LSC and SSC regions were highly conserved, and the genes spanning or close to the boundaries of the IR and SC regions mainly included rps19, rpl22, rpl2, ycf1, ndhF, and trnH. Among them, ndhF, which is involved in photosynthesis, is located at the JSB boundary, at a distance of 2-15 bp from it. The ycf1 gene crosses the JSA boundary and extends from 5222 bp (S. depsangensis) to 5300 bp (S. leucophylla) towards the IRa region and from 18,522 bp (S. gossipiphora) to 18,690 bp (S. leucophylla) from the IRa region towards the SSC region. The rps19 gene extends 6 bp towards the IRb region. The JLA boundary is located between rpl2 and trnH.

Comparative Genome Analyses of Saussurea Species
To understand the degree of sequence similarity between the chloroplast genomes of S. katochaete, S. superba, and S. stella and those of the seven other closely related Saussurea species, the S. phaeantha chloroplast genome was used as the reference sequence for a full sequence comparison and analysis. The results show that the chloroplast genome sequences of the 10 species exhibit high levels of similarity and highly conserved genes, and no obvious gene rearrangements were found (Supplementary Figure S1, Figure 6). The sequences of the LSC and SSC regions were found to be more variable than those of the IR regions (Figure 7), and the sequences of non-coding regions were more variable than those of the coding regions. Among these, several loci, such as trnC-GCA-petN, rpl32-trnL-UAG, and ycf1, showed lower levels of similarity.

Analysis of the Nucleotide Diversity
A total of 1112 variable (polymorphic) sites were identified in 151,485 nucleotide sites, including 694 singleton variable sites (SVS) and 418 parsimony-informative sites (PIS). Two different types were observed under SVS: 690 sites with two variants (SV2V) and four sites with three variants (SV3V). Similarly, there were two types of PIS: four loci with two variants (PIS2V) and seven loci with three variants (PIS3V). In addition, to quantify the level of nucleotide polymorphism, the chloroplast genomes of the 10 Saussurea plant species were compared and analyzed using DnaSP software (version 5.10.01). The results showed that the nucleotide diversity (Pi) values in the chloroplast genomes of the 10 Saussurea species varied from 0 to 0.00911, with a mean value of 0.00200. At least nine highly variable regions are present in the Saussurea chloroplast genomes, i.e., rpl32-trnL-UAG, rpl32, ndhF-rpl32, ycf1, trnC-GCA-petN, trnC-GCA, rpcL, psbE-petL, and rpl16-trnG-UUG. These genes or intergenic spacer regions have high variability (Pi > 0.007). The highest Pi values found were 0.00911, 0.00828, 0.00806, 0.00786, 0.00753, 0.00719, 0.00719, 0.00714, and 0.00711 (Figure 8). These regions can be used as potential DNA barcodes for Saussurea species.
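Pi is the average proportion of nucleotide differences between all pairs of sequences in an alignment, and highly variable regions are typically found by computing it in windows along the genome. The sketch below shows this calculation on a toy alignment; it assumes gap-free aligned sequences of equal length, and the function names, window size, and step size are illustrative assumptions rather than the DnaSP settings used in the study.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise proportion of differing sites (Pi).

    `aligned` is a list of equal-length, gap-free aligned sequences.
    """
    length = len(aligned[0])
    pairs = list(combinations(aligned, 2))
    total = 0.0
    for a, b in pairs:
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        total += diffs / length
    return total / len(pairs)


def sliding_pi(aligned, window, step):
    """Pi for each window; returns (0-based window start, Pi) tuples."""
    length = len(aligned[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in aligned]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment (hypothetical sequences, not from the Saussurea genomes).
alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
print(nucleotide_diversity(alignment))
print(sliding_pi(alignment, window=4, step=4))
```

Windows whose Pi exceeds a chosen cutoff (Pi > 0.007 in the text above) would be flagged as highly variable regions.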

Phylogenetic Analysis
The phylogenetic trees constructed using the maximum likelihood (ML) and Bayesian inference (BI) methods were in agreement (Figure 9). Twenty-nine species of the tribe Cynareae clustered in the same large branch, separate from the outgroup taxa. The two Carlininae plants, Atractylodes lancea and Carlina acaulis, located at the base of the evolutionary tree, clustered into one branch. Secondly, Carduus crispus, Cirsium japonicum, and Cynara cardunculus of the Carduinae clustered into one branch. Centaurea cyrdunculus of Centaureinae clustered into one branch, while Carthamus tinctorius of Centaureinae and Arctium lappa of Carduinae, as well as Dolomiaea calophylla of Carduinae, grouped together in a single unit, distinguishing S. albifolia of Carduinae from 19 other plants of the same genus. In the present study, S. stella clustered with S. depsangensis, S. inversa, S. medusa, and S. gossipiphora, while S. katochaete and S. superba clustered together in a single unit that was closest in kinship to S. gossipiphora. It can be seen that S. katochaete and S. superba are closely related, and both are relatively distantly related to S. stella.
Figure 7. The BLAST Atlas results of the chloroplast genomes of 10 Saussurea species. The CDS regions in the reference genome were BLASTed against the CDS regions in the query genomes, and the top hits were rendered in a genome map using GView. The purple and black wave charts represent the GC content and skew. The innermost slot on the map (orange) shows the CDS regions on the reference genome. The reference was S. katochaete. From the outside to the inside, they represent S. phaeantha, S. sutchuenensis, S. bullockii, S. leucophylla, S. gossipiphora, S. depsangensis, S. apus, S. stella, and S. superba.

Chloroplast Genomic Characteristics and Sequence Variation in the Three Saussurea Species
In this study, the complete chloroplast genomes of three species of Saussurea (S. katochaete, S. superba, and S. stella) were analyzed. They were found to be similar to those of other herbaceous plants, with typical double-stranded circular quadripartite structures [16,25] and sizes of 152,586 bp, 152,490 bp, and 152,442 bp, respectively. Changes in the size of plant chloroplast genomes are mainly affected by the expansion and contraction of IR regions, variation in spacer regions, and gene loss. At the same time, these expansion and contraction events cause sequence disruptions of genes located at the edges of IR regions, ultimately resulting in the formation of pseudogenes. These pseudogenes do not participate in the protein-coding process, and their products are non-essential in the organism; hence, they are also known as non-functional genes [24,26,27]. However, some other studies have found that the nucleotide sequences of the pseudogenes are well-preserved, and therefore, they may not be non-functional genes, as traditionally perceived [28]. In this study, 124-128 genes were identified in the chloroplast genomes of the three Saussurea plants. Their protein-coding genes, tRNA genes, rRNA genes, and intron-containing genes were extremely similar in number, which explains, to some extent, why they have similar morphologies. Among them, two shared pseudogenes, rps19 and ycf1, were also identified. These are commonly found in the chloroplast genomes of many angiosperms and herbaceous plants, such as Lycium (Solanaceae) [29], Allium chrysanthum (Amaryllidaceae J.St.-Hil.) [30], and C. cardunculus (Asteraceae) [31], among others. In recent years, it has been found that the ycf1 gene may be involved in the process of photosynthesis and in responses to environmental changes in plants [32]. It is a key DNA barcode in plants [23], but expansion and contraction of the IR region may also lead to its fragmentation [26]. In addition, the contraction and expansion of the IR region resulted in the loss of the ycf2 gene in S. superba and S. stella, suggesting that genetic differentiation may occur in the region of loss in Saussurea species and may also inhibit chloroplast genome enlargement in Saussurea species to a certain extent [24,33].
A total of 22 intron-containing genes were identified in the three Saussurea plants. These included 18 genes with one intron, two genes with two introns, and two trans-spliced genes. These intron-containing genes can play important roles in regulating gene expression and improving agronomic traits in plants [34].
The determination of long repeated sequences and simple sequence repeats (SSRs) is closely related to the genetic diversity of plants and the identification of molecular markers for germplasm resources [35]. Long repetitive sequences are usually found in the intergenic spacer (IGS) and intronic regions of plant chloroplast genomes [36]. In this study, the repetitive sequences observed in the chloroplast genomes of the three Saussurea plant species were very heterogeneous in number, type, and length. Palindromic (P) repeats occurred at a very high frequency relative to forward (F) repeats, and tandem (T) repeats were not detected in any of the three Saussurea plants, suggesting that there may be a certain degree of difference in the mutation frequency among the three species [35]. These long repetitive sequences play important roles in gene recombination and sequence structure variation [23] and can serve as potential indicators for distinguishing Saussurea species. SSRs, also known as microsatellites, are a class of short repetitive sequences found in the chloroplast genome. They are prevalent in eukaryotes and prokaryotes and are closely related to gene expression and regulation [37]. These DNA sequences are involved in a wide range of life processes in plant cells, and because of their high level of polymorphism, they are often used in the fields of genetic diversity, species identification, and the development of molecular markers [21,23]. The three Saussurea plants included in this study, S. katochaete, S. superba, and S. stella, were identified as having 46, 45, and 43 SSR loci, respectively (Tables 8-10). Six types of SSRs were included, i.e., mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide, which are mostly located in the non-coding regions. Mononucleotide (A/T) is the predominant type of repeat, a result similar to those of studies on the predominance of A/T in the base composition of SSRs in other herbaceous plants, such as Thalictrum cirrhosum (Ranunculaceae) [38] and Themeda japonica (Gramineae) [39]. This A/T-dominant repeat type is thought to be a widespread phenomenon in the chloroplast genomes of higher plants, and the fact that preferred codons predominantly end in A/U may contribute to this bias. Previous studies have suggested that this may be related to natural selection and genetic mutation [40]. In addition, we identified nine highly variable genes or intergenic regions (rpl32-trnL-UAG, rpl32, ndhF-rpl32, ycf1, trnC-GCA-petN, trnC-GCA, rpcL, psbE-petL, and rpl16-trnG-UUG), and the sequence variability is clearly higher in the LSC and SSC regions than in the IRs. This may be because certain duplicated genes within the IR region prevent mutations from occurring [40]. Some of these rare SSR loci and highly variable regions can be used as potential molecular markers for studies of intraspecific genetic variation, population identification, species evolution, and phylogenetics in plants. For example, rps8, rpl16, psbE-petL, and ndhF-rpl32 have shown high levels of resolution in population identification and phylogenetic analyses of Phoenix dancong (Camellia sinensis) [41], Oryza sativa (Gramineae) [42], and some Lauraceae plants (Machilus yunnanensis, Machilus balansae) [43].
Codon preferences in plant chloroplast genomes can reveal phylogenetic relationships across species or within the same species, and mutation and natural selection are closely linked to codon preferences in genes [44]. For example, the codon preferences of some Euphorbiaceae plants [45] are mainly influenced by natural selection, and those of Oncidium Gower Ramsey [46], an ornamental flower, are mainly influenced by mutations. It has also been suggested that the factors affecting codon preferences in the chloroplast genome are complex and diverse and are not determined by natural selection or mutation alone [47]. In this study, codons ending in A/U were dominant in the chloroplast genomes of the three Saussurea plants, and the RSCU values were close to each other, with the number of preferred codons being 30 (RSCU > 1) and the number of non-preferred codons being 32 (RSCU < 1). This suggests that the chloroplast genomes of Saussurea plants are relatively conserved and that they have formed a unique codon usage system that distinguishes them from other species during the evolutionary and developmental processes. In addition, the three Saussurea species showed a higher preference for the codon UUA, which encodes Leu, than for other codons, with RSCUs as high as 1.95, 2.1, and 2.05, respectively. This suggests that the trnL-UAA gene may have exerted an important influence on the evolutionary development of Saussurea species.

Phylogenetic Analyses in Saussurea Species
Although the chloroplast genome dataset is still considered to have many shortcomings, its gene sequences remain important for revealing phylogenetic relationships between or within species because they contain many informative sites [24,33,48]. The phylogenetic tree of Saussurea plants has a topology similar to those of other reported species [24], suggesting that many closely related species within the genus may have originated from a common ancestor. The genes flanking the JLB, JSB, JSA, and JLA borders of the three species in this study, S. katochaete, S. superba, and S. stella, were the same, namely rpl22, rps19, rpl2, ndhF, ycf1, and psbA. With the exception of minor differences in the expansion lengths of the genes at the JSB boundary, the expansion lengths were consistent across the remaining boundaries, suggesting close phylogenetic relationships among these three Saussurea species. After diverging from a common ancestor, S. stella, S. depsangensis, S. inversa, S. medusa, and S. gossipiphora clustered together in a separate monophyletic group, whereas S. katochaete and S. superba formed a sister pair. Thus, S. katochaete and S. superba are more closely related to each other than either is to S. stella.
Saussurea was first established in 1810 and is distributed in Asia and Europe. In China, it is mainly found in the high-altitude areas of the southwest and northwest. Because the genus is widely distributed and rich in species, morphological differentiation among species is unclear, and classification relies mainly on phenotypic characteristics, such as the leaves subtending the capitulum, its taxonomy and interspecific identification have been controversial [4,49]. After Mendel's laws of inheritance were introduced in 1865, researchers generally believed that the evolutionary process of plants could be fully explained by applying them. However, studies on some higher plants showed in 1909 that certain variable traits in plant evolution may be closely related to chloroplasts rather than being caused by Mendelian modes of inheritance [50]. It was not until the 1960s that studies by Sager and Ishida further confirmed that plant chloroplasts indeed contain a unique class of genetic material that is capable of influencing certain traits in plants [9]. Phylogenetic analyses of Saussurea plants in the present study revealed a high frequency of differentiation among Saussurea species and great morphological differences among many of them, which may also be related to certain genetic material in the chloroplasts.

Genome Assembly and Annotation
Genome assembly was performed using Flye (version: v.2.9; parameters: meta-plasmids), and the assembled sequences were compared with a closely related reference genome using Blastn (version: 2.12.0+; parameters: E-value 1 × 10⁻⁵) to identify the candidate chloroplast sequences. The connections between the chloroplast sequences were determined based on the sequencing depth, the read alignments, and a comparison with closely related species. Where the connected sequences contained gaps (runs of N), these were filled using Gapcloser (Version: 1.12) to obtain the final assembly.
The functional annotation of the chloroplast genome included coding gene prediction and non-coding RNA annotation (rRNA and tRNA annotations). Gene annotation was performed using the chloroplast-specific annotation software CPGAVAS2 (http://47.96.249.172:16019/analyzer/annotate, accessed on 2 August 2023).
We used vmatch (http://www.vmatch.de/, accessed on 7 August 2023; parameter: minimal repeat size 30 bp) to find dispersed long repeat sequences in the chloroplast genome. Codon usage bias was assessed using relative synonymous codon usage (RSCU), which measures the preference among synonymous codons. The RSCU value is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for the same amino acid were used equally. If there is no preference in codon usage, the RSCU value is 1; if the codon is used more frequently than its synonymous codons, the RSCU value is greater than 1; if the opposite is true, the RSCU value is less than 1. RSCU was estimated for coding sequences longer than 300 bp that began with "ATG", "TTG", "CTG", "ATT", "ATC", "GTG", or "ATA" as the start codon and ended with "TGA", "TAG", or "TAA" as the stop codon. Codon usage bias was analyzed using CodonW (Version: 1.4.4).
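The RSCU definition above can be illustrated with a minimal Python sketch. It is not CodonW and is not the pipeline used in this study; the function name `rscu`, the `min_len` parameter, and the choice to exclude stop codons from the RSCU table are assumptions made for illustration only. Filtering by length, start codon, and stop codon follows the criteria described in the text.

```python
from collections import Counter
from itertools import product

# Standard genetic code (amino acid assignments are shared with the
# bacterial/plastid code); codons are enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

START_CODONS = {"ATG", "TTG", "CTG", "ATT", "ATC", "GTG", "ATA"}
STOP_CODONS = {"TGA", "TAG", "TAA"}


def rscu(cds_list, min_len=300):
    """Return {codon: RSCU} for CDSs passing the length/start/stop filters.

    Assumption: stop codons are left out of the RSCU table, and amino
    acids that never occur in the input are omitted from the result.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        if len(cds) <= min_len or len(cds) % 3 != 0:
            continue
        if cds[:3] not in START_CODONS or cds[-3:] not in STOP_CODONS:
            continue
        # Count every codon except the terminal stop codon.
        for i in range(0, len(cds) - 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        for c in codons:
            # observed count / (total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values
```

In this sketch, a codon with a value above 1 would be counted as preferred and one below 1 as non-preferred, matching the interpretation given in the text; values within one amino acid family always sum to the number of synonymous codons in that family.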

Genome Comparison and Nucleotide Variation Analysis
The boundaries of the LSC, SSC, and IR regions of the Saussurea chloroplast genomes and their adjacent genes were visualized and analyzed using the IRscope online analysis tool (Amiryousefi et al., 2018). The whole chloroplast genome sequences of Saussurea were compared and visualized using the mVISTA online tool (https://genome.lbl.gov/vista/index.shtml, accessed on 8 August 2023) in Shuffle-LAGAN mode with S. phaeantha as the reference sequence. The corresponding genome sequences were aligned using MAFFT (Version: v7.487). A chloroplast genome polymorphism analysis (nucleotide diversity, Pi) was performed using DnaSP (Version: v.5.10.01) with a window length of 800 bp and a step size of 200 bp.
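The sliding-window nucleotide diversity analysis can be sketched in a few lines of Python. This is an illustrative approximation, not DnaSP: the function names are hypothetical, alignment columns containing gaps or ambiguous bases are simply skipped (an assumption; DnaSP's exact handling may differ), and Pi for a window is taken as the mean proportion of differing sequence pairs over the retained columns. The default window and step match the 800 bp and 200 bp stated above.

```python
from collections import Counter


def site_diversity(column):
    """Proportion of sequence pairs that differ at one alignment column."""
    n = len(column)
    total_pairs = n * (n - 1) // 2
    identical_pairs = sum(c * (c - 1) // 2 for c in Counter(column).values())
    return (total_pairs - identical_pairs) / total_pairs


def sliding_window_pi(alignment, window=800, step=200):
    """Return [(window midpoint, Pi)] for an aligned list of sequences.

    Assumption: columns with gaps or ambiguous bases are excluded, and a
    window with no usable column yields Pi = None.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    seqs = [seq.upper() for seq in alignment]

    # Per-column diversity, or None where the column is not usable.
    per_site = []
    for i in range(length):
        column = [seq[i] for seq in seqs]
        if all(base in "ACGT" for base in column):
            per_site.append(site_diversity(column))
        else:
            per_site.append(None)

    results = []
    for start in range(0, length - window + 1, step):
        usable = [v for v in per_site[start:start + window] if v is not None]
        pi = sum(usable) / len(usable) if usable else None
        results.append((start + window // 2, pi))
    return results
```

Plotting the returned midpoints against the Pi values gives a profile of the kind described for the nucleotide diversity figure, in which peaks mark candidate highly variable regions.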

Phylogenetic Analyses
To clarify the phylogenetic positions of S. katochaete, S. superba, and S. stella in Saussurea, 20 Saussurea plants and nine other plants of Compositae, Trib. Cynareae were studied, and Helianthus annuus (Compositae, Trib. Heliantheae, Helianthus) was used as the outgroup (Table 12). The sequences were aligned using MAFFT (Version: v7.487). An ML (maximum likelihood) tree was then constructed using IQ-TREE (Version: v.1.6.8) with the TVM+F+I+G4 model, and a BI (Bayesian inference) tree was constructed using MrBayes (Version: v.3.2.6) with the GTR+I+G model, 2,000,000 generations, and a sampling frequency of 100. The trees were visualized using iTOL (http://itol.embl.de/, accessed on 17 August 2023). The three species included in this study are bolded.

Conclusions
In this study, we assembled, annotated, and analyzed the whole chloroplast genomes of three species of Saussurea and explored their genomic features and phylogenetic relationships with other closely related species. The results show that the three Saussurea plants, similar to other herbaceous plants, have relatively conserved genome structures, abundant SSR loci, and some highly variable genes or intergenic regions. This provides an important basis for the population identification and phylogenetic study of Saussurea plants. The degree of sequence variation in the LSC and SSC regions of the genome was significantly higher than that in the IR region. The trnL-UAA gene may have had an important influence on the evolutionary

Figure 1. Chloroplast genome map of S. katochaete. The gray arrows indicate the direction of gene transcription. Genes in the inner circle are transcribed in a clockwise direction, while those in the outer circle are transcribed in a counter-clockwise direction.

Figure 2. Chloroplast genome map of S. superba. The gray arrows indicate the direction of gene transcription. Genes in the inner circle are transcribed in a clockwise direction, while those in the outer circle are transcribed in a counter-clockwise direction.

Figure 3. Chloroplast genome map of S. stella. The gray arrows indicate the direction of gene transcription. Genes in the inner circle are transcribed in a clockwise direction, while those in the outer circle are transcribed in a counter-clockwise direction.

Figure 5. Comparison of the boundaries of the LSC, SSC, and IR regions of the chloroplast genomes of 10 species of Saussurea. Genes around the border are shown above and below the main line. JLB, JSB, JSA, and JLA represent the junctions of LSC/IRb, IRb/SSC, SSC/IRa, and IRa/LSC, respectively.

Figure 6. Comparative genomic analyses of 10 species of Saussurea. Gray arrows indicate the gene orientation, pink indicates non-coding sequences, purple indicates exons, light blue indicates rRNAs, and white bumps indicate different regions of chloroplast genes. The x-axis delineates the aligned nucleotide positions from the S. phaeantha chloroplast genome (MT554930.1), and the y-axis displays pairwise identity percentages, spanning from 50% to 100%.

Figure 7. The BLAST Atlas results of chloroplast genomes of 10 Saussurea species. The CDS regions in the reference genome were BLASTed against the CDS regions in the query genomes, and the top hits were rendered in a genome map using GView. The purple and black wave charts represent the GC content and skew. The innermost slot on the map (orange) shows the CDS regions on the reference genome. The reference was S. katochaete. From the outside to the inside, the rings represent S. phaeantha, S. sutchuenensis, S. bullockii, S. leucophylla, S. gossipiphora, S. depsangensis, S. apus, S. stella, and S. superba.

Figure 8. Analysis of the nucleotide diversity in the chloroplast genomes of 10 species of Saussurea. The X-axis and Y-axis show the positions of the midpoint of a window and the Pi values, respectively.

Figure 9. Phylogenetic analysis of Saussurea. The red pentagrams represent the three species included in this study, and the green pentagrams represent the seven closely related species. The ML tree was reconstructed by IQ-TREE v.1.6.8. The numbers next to the branches represent ML bootstrap support values. The BI tree was reconstructed by MrBayes v.3.2.6. The numbers next to the branches represent BI probability support values.

Table 1. Chloroplast genome base composition of three Saussurea species.
IRA and IRB represent the two inverted repeat regions; LSC represents the large single copy region; and SSC represents the small single copy region.

Table 2. Chloroplast genome annotation and classification analysis of S. katochaete, S. superba, and S. stella.

Table 3. The lengths of the introns and exons for the split genes of the chloroplast genome of S. katochaete.

Table 4. The lengths of the introns and exons for the split genes of the chloroplast genome of S. superba.

Table 5. The lengths of the introns and exons for the split genes of the chloroplast genome of S. stella.

Table 6. Statistics for the long repetitive sequences in the chloroplast genomes of S. katochaete, S. superba, and S. stella.

Table 7. SSR analysis of the chloroplast genomes of S. katochaete, S. superba, and S. stella.
"/" means no such repeat type.

Table 8. SSRs in the chloroplast genome of S. katochaete.

Table 9. SSRs in the chloroplast genome of S. superba.

Table 10. SSRs in the chloroplast genome of S. stella.

Table 11. Codon usage bias analysis of the chloroplast genomes of S. katochaete, S. superba, and S. stella.

Table 12. Plant samples used in this study.